{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "结构化数据：NumPy的结构化数组\n",
    "本节介绍NumPy的结构化数组和记录数组，它们为复合的、异构的数据提供了非常有效的存储。尽管这里列举的模式对于简单的操作非常有用，但是这些场景通常也可以用Pandas的DataFrame来实现"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[('name', '<U10'), ('age', '<i4'), ('weight', '<f8')]\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "name = ['Alice', 'Bob', 'Cathy', 'Doug']       \n",
    "age = [25, 45, 37, 19] \n",
    "weight = [55.0, 85.5, 68.0, 61.5]\n",
    "data = np.zeros(4,dtype={'names':('name','age', 'weight'),\n",
    "                                                     'formats':('U10', 'i4', 'f8')})\n",
    "print(data.dtype) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[('Alice', 25, 55. ) ('Bob', 45, 85.5) ('Cathy', 37, 68. )\n",
      " ('Doug', 19, 61.5)]\n"
     ]
    }
   ],
   "source": [
    "data['name'] = name\n",
    "data['age'] = age\n",
    "data['weight'] = weight\n",
    "print(data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['Alice', 'Bob', 'Cathy', 'Doug'], dtype='<U10')"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data['name']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "('Alice', 25, 55.)"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Doug'"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data[-1]['name']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['Alice', 'Doug'], dtype='<U10')"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data[data['age']<30]['name']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "生成结构化数组  \n",
    "结构化数组的数据类型有多种制定方式。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "dtype([('name', '<U10'), ('age', '<i8'), ('weight', '<f4')])"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.dtype({'names':('name','age','weight'),\n",
    "                        'formats':((np.str_, 10), int, np.float32)})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "dtype([('name', 'S10'), ('age', '<i4'), ('weight', '<f8')])"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.dtype([('name','S10'),(\"age\",'i4'),('weight', 'f8')])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "dtype([('f0', 'S10'), ('f1', '<i4'), ('f2', '<f8')])"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.dtype('S10, i4, f8')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![](2021-11-02-21-46-20.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "NumPy中也可以定义更高级的复合数据类型。例如，你可以创建一种类型，其中每个元素都包含一个数组或矩阵。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(0, [[0., 0., 0.], [0., 0., 0.], [0., 0., 0.]])\n",
      "[[0. 0. 0.]\n",
      " [0. 0. 0.]\n",
      " [0. 0. 0.]]\n"
     ]
    }
   ],
   "source": [
    "tp = np.dtype([('id', 'i8'), ('mat', 'f8', (3,3))])\n",
    "X = np.zeros(1,dtype=tp)\n",
    "print(X[0])\n",
    "print(X['mat'][0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "现在X数组的每个元素都包含一个id和一个3×3的矩阵。为什么我们宁愿用这种方法存储数据，也不用简单的多维数组，或者Python字典呢？原因是NumPy的dtype直接映射到C结构的定义，因此包含数组内容的缓存可以直接在C程序中使用。如果你想写一个Python接口与一个遗留的C语言或Fortran库交互，从而操作结构化数据，你将会发现结构化数组非常有用！"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "记录数组：结构化数组的扭转  \n",
    "NumPy还提供了np.recarray类。它和前面介绍的结构化数组几乎相同，但是它有一个独特的特征：域可以像属性一样获取，而不是像字典的键那样获取。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([25, 45, 37, 19], dtype=int32)"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data['age']\n",
    "data_rec = data.view(np.recarray)\n",
    "data_rec.age"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "但是如果每天都需要使用结构化数据，那么Pandas包是更好的选择，我们将在接下来的一章详细介绍它"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "interpreter": {
   "hash": "883c2649e32be1aa9ccde59cb670140dd98b9d8177e43f2e5a1f7be5c7e7c586"
  },
  "kernelspec": {
   "display_name": "Python 3.7.10 64-bit ('yolov5': conda)",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
